{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基于评分卡的风控模型开发  \n",
    "  \n",
    "数据集GiveMeSomeCredit，15万样本数据  \n",
    "https://www.kaggle.com/c/GiveMeSomeCredit/data \n",
    "使用WOE进行特征变换，IV进行特征筛选，LR构建风控模型，并对模型评分规则进行可解释性说明   \n",
    "- 基本属性：包括了借款人当时的年龄\n",
    "- 偿债能力：包括了借款人的月收入、负债比率\n",
    "- 信用往来：两年内35-59天逾期次数、两年内60-89天逾期次数、两年内90天或高于90天逾期的次数\n",
    "- 财产状况：包括了开放式信贷和贷款数量、不动产贷款或额度数量。\n",
    "- 其他因素：包括了借款人的家属数量"
   ]
  },
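  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the WOE and IV quantities mentioned above, here is a minimal sketch on toy data. It assumes the convention WOE = ln(good share / bad share) and IV = sum((good share - bad share) * WOE); the opposite sign convention is also common. The function name `woe_iv` and the toy bins are hypothetical and are not part of this notebook's pipeline.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def woe_iv(bins, target):\n",
    "    # bins: bin label of each sample; target: 1 = default (bad), 0 = good\n",
    "    stats = (pd.DataFrame({'bin': bins, 'bad': target})\n",
    "             .groupby('bin')['bad'].agg(['count', 'sum']))\n",
    "    bad = stats['sum']\n",
    "    good = stats['count'] - stats['sum']\n",
    "    good_pct = good / good.sum()  # share of all good samples in each bin\n",
    "    bad_pct = bad / bad.sum()     # share of all bad samples in each bin\n",
    "    woe = np.log(good_pct / bad_pct)\n",
    "    iv = ((good_pct - bad_pct) * woe).sum()\n",
    "    return woe, iv\n",
    "\n",
    "# Toy data for illustration only, not taken from GiveMeSomeCredit\n",
    "toy_bins = ['young'] * 4 + ['old'] * 4\n",
    "toy_target = [1, 1, 0, 0, 1, 0, 0, 0]\n",
    "woe, iv = woe_iv(toy_bins, toy_target)\n",
    "print(woe)\n",
    "print(iv)\n",
    "```\n",
    "\n",
    "A bin with no good samples or no bad samples would give an infinite WOE, so real binning usually merges such bins or applies smoothing first."
   ]
  },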
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-12T02:57:06.924126Z",
     "start_time": "2020-12-12T02:57:06.918126Z"
    }
   },
   "source": [
    "## 数据预处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:56:58.565930Z",
     "start_time": "2020-12-13T06:56:56.637820Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:56:59.039958Z",
     "start_time": "2020-12-13T06:56:58.569931Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>SeriousDlqin2yrs</th>\n",
       "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
       "      <th>age</th>\n",
       "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
       "      <th>DebtRatio</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>NumberOfTimes90DaysLate</th>\n",
       "      <th>NumberRealEstateLoansOrLines</th>\n",
       "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
       "      <th>NumberOfDependents</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.766127</td>\n",
       "      <td>45</td>\n",
       "      <td>2</td>\n",
       "      <td>0.802982</td>\n",
       "      <td>9120.0</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0.957151</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0.121876</td>\n",
       "      <td>2600.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.658180</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>0.085113</td>\n",
       "      <td>3042.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0.233810</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.036050</td>\n",
       "      <td>3300.0</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0.907239</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>0.024926</td>\n",
       "      <td>63588.0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>0.213179</td>\n",
       "      <td>74</td>\n",
       "      <td>0</td>\n",
       "      <td>0.375607</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0.305682</td>\n",
       "      <td>57</td>\n",
       "      <td>0</td>\n",
       "      <td>5710.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0.754464</td>\n",
       "      <td>39</td>\n",
       "      <td>0</td>\n",
       "      <td>0.209940</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>0.116951</td>\n",
       "      <td>27</td>\n",
       "      <td>0</td>\n",
       "      <td>46.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>0.189169</td>\n",
       "      <td>57</td>\n",
       "      <td>0</td>\n",
       "      <td>0.606291</td>\n",
       "      <td>23684.0</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  \\\n",
       "0           1                 1                              0.766127   45   \n",
       "1           2                 0                              0.957151   40   \n",
       "2           3                 0                              0.658180   38   \n",
       "3           4                 0                              0.233810   30   \n",
       "4           5                 0                              0.907239   49   \n",
       "5           6                 0                              0.213179   74   \n",
       "6           7                 0                              0.305682   57   \n",
       "7           8                 0                              0.754464   39   \n",
       "8           9                 0                              0.116951   27   \n",
       "9          10                 0                              0.189169   57   \n",
       "\n",
       "   NumberOfTime30-59DaysPastDueNotWorse    DebtRatio  MonthlyIncome  \\\n",
       "0                                     2     0.802982         9120.0   \n",
       "1                                     0     0.121876         2600.0   \n",
       "2                                     1     0.085113         3042.0   \n",
       "3                                     0     0.036050         3300.0   \n",
       "4                                     1     0.024926        63588.0   \n",
       "5                                     0     0.375607         3500.0   \n",
       "6                                     0  5710.000000            NaN   \n",
       "7                                     0     0.209940         3500.0   \n",
       "8                                     0    46.000000            NaN   \n",
       "9                                     0     0.606291        23684.0   \n",
       "\n",
       "   NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  \\\n",
       "0                               13                        0   \n",
       "1                                4                        0   \n",
       "2                                2                        1   \n",
       "3                                5                        0   \n",
       "4                                7                        0   \n",
       "5                                3                        0   \n",
       "6                                8                        0   \n",
       "7                                8                        0   \n",
       "8                                2                        0   \n",
       "9                                9                        0   \n",
       "\n",
       "   NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  \\\n",
       "0                             6                                     0   \n",
       "1                             0                                     0   \n",
       "2                             0                                     0   \n",
       "3                             0                                     0   \n",
       "4                             1                                     0   \n",
       "5                             1                                     0   \n",
       "6                             3                                     0   \n",
       "7                             0                                     0   \n",
       "8                             0                                     0   \n",
       "9                             4                                     0   \n",
       "\n",
       "   NumberOfDependents  \n",
       "0                 2.0  \n",
       "1                 1.0  \n",
       "2                 0.0  \n",
       "3                 0.0  \n",
       "4                 0.0  \n",
       "5                 1.0  \n",
       "6                 0.0  \n",
       "7                 0.0  \n",
       "8                 NaN  \n",
       "9                 2.0  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('./cs-training.csv')\n",
    "data.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:56:59.143964Z",
     "start_time": "2020-12-13T06:56:59.047958Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SeriousDlqin2yrs</th>\n",
       "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
       "      <th>age</th>\n",
       "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
       "      <th>DebtRatio</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>NumberOfTimes90DaysLate</th>\n",
       "      <th>NumberRealEstateLoansOrLines</th>\n",
       "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
       "      <th>NumberOfDependents</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.766127</td>\n",
       "      <td>45</td>\n",
       "      <td>2</td>\n",
       "      <td>0.802982</td>\n",
       "      <td>9120.0</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0.957151</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0.121876</td>\n",
       "      <td>2600.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0.658180</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>0.085113</td>\n",
       "      <td>3042.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0.233810</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.036050</td>\n",
       "      <td>3300.0</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0.907239</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>0.024926</td>\n",
       "      <td>63588.0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149995</th>\n",
       "      <td>0</td>\n",
       "      <td>0.040674</td>\n",
       "      <td>74</td>\n",
       "      <td>0</td>\n",
       "      <td>0.225131</td>\n",
       "      <td>2100.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149996</th>\n",
       "      <td>0</td>\n",
       "      <td>0.299745</td>\n",
       "      <td>44</td>\n",
       "      <td>0</td>\n",
       "      <td>0.716562</td>\n",
       "      <td>5584.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149997</th>\n",
       "      <td>0</td>\n",
       "      <td>0.246044</td>\n",
       "      <td>58</td>\n",
       "      <td>0</td>\n",
       "      <td>3870.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149998</th>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5716.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149999</th>\n",
       "      <td>0</td>\n",
       "      <td>0.850283</td>\n",
       "      <td>64</td>\n",
       "      <td>0</td>\n",
       "      <td>0.249908</td>\n",
       "      <td>8158.0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150000 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  \\\n",
       "0                      1                              0.766127   45   \n",
       "1                      0                              0.957151   40   \n",
       "2                      0                              0.658180   38   \n",
       "3                      0                              0.233810   30   \n",
       "4                      0                              0.907239   49   \n",
       "...                  ...                                   ...  ...   \n",
       "149995                 0                              0.040674   74   \n",
       "149996                 0                              0.299745   44   \n",
       "149997                 0                              0.246044   58   \n",
       "149998                 0                              0.000000   30   \n",
       "149999                 0                              0.850283   64   \n",
       "\n",
       "        NumberOfTime30-59DaysPastDueNotWorse    DebtRatio  MonthlyIncome  \\\n",
       "0                                          2     0.802982         9120.0   \n",
       "1                                          0     0.121876         2600.0   \n",
       "2                                          1     0.085113         3042.0   \n",
       "3                                          0     0.036050         3300.0   \n",
       "4                                          1     0.024926        63588.0   \n",
       "...                                      ...          ...            ...   \n",
       "149995                                     0     0.225131         2100.0   \n",
       "149996                                     0     0.716562         5584.0   \n",
       "149997                                     0  3870.000000            NaN   \n",
       "149998                                     0     0.000000         5716.0   \n",
       "149999                                     0     0.249908         8158.0   \n",
       "\n",
       "        NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  \\\n",
       "0                                    13                        0   \n",
       "1                                     4                        0   \n",
       "2                                     2                        1   \n",
       "3                                     5                        0   \n",
       "4                                     7                        0   \n",
       "...                                 ...                      ...   \n",
       "149995                                4                        0   \n",
       "149996                                4                        0   \n",
       "149997                               18                        0   \n",
       "149998                                4                        0   \n",
       "149999                                8                        0   \n",
       "\n",
       "        NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  \\\n",
       "0                                  6                                     0   \n",
       "1                                  0                                     0   \n",
       "2                                  0                                     0   \n",
       "3                                  0                                     0   \n",
       "4                                  1                                     0   \n",
       "...                              ...                                   ...   \n",
       "149995                             1                                     0   \n",
       "149996                             1                                     0   \n",
       "149997                             1                                     0   \n",
       "149998                             0                                     0   \n",
       "149999                             2                                     0   \n",
       "\n",
       "        NumberOfDependents  \n",
       "0                      2.0  \n",
       "1                      1.0  \n",
       "2                      0.0  \n",
       "3                      0.0  \n",
       "4                      0.0  \n",
       "...                    ...  \n",
       "149995                 0.0  \n",
       "149996                 2.0  \n",
       "149997                 0.0  \n",
       "149998                 0.0  \n",
       "149999                 0.0  \n",
       "\n",
       "[150000 rows x 11 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train = data.iloc[:,1:]\n",
    "df_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:56:59.315973Z",
     "start_time": "2020-12-13T06:56:59.150964Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>字段</th>\n",
       "      <th>说明</th>\n",
       "      <th>类型</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>SeriousDlqin2yrs</td>\n",
       "      <td>90天以上逾期或更差</td>\n",
       "      <td>Y/N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Age</td>\n",
       "      <td>年龄</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>除房地产和汽车贷款等无分期付款债务外，信用卡和个人信用额度的总余额除以信贷限额</td>\n",
       "      <td>百分比</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DebtRatio</td>\n",
       "      <td>债务比（每月偿还的债务，赡养费，生活费除以每月的总收入）</td>\n",
       "      <td>百分比</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>MonthlyIncome</td>\n",
       "      <td>每月收入</td>\n",
       "      <td>实数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NumberOfOpenCreditLinesAndLoans</td>\n",
       "      <td>公开贷款(如汽车贷款或抵押贷款)和信用额度(如信用卡)的数量</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NumberRealEstateLoansOrLines</td>\n",
       "      <td>抵押贷款和房地产贷款的额度（包括房屋净值信贷）</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>借款人逾期30-59天的次数，但在过去两年没有更糟</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>借款人逾期60-89天的次数，但在过去两年没有更糟</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>借款人逾期90天（或以上）的次数</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NumberOfDependents</td>\n",
       "      <td>除自己(配偶、子女等)以外的家庭受养人人数</td>\n",
       "      <td>整数</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      字段  \\\n",
       "0                       SeriousDlqin2yrs   \n",
       "1                                    Age   \n",
       "2   RevolvingUtilizationOfUnsecuredLines   \n",
       "3                              DebtRatio   \n",
       "4                          MonthlyIncome   \n",
       "5        NumberOfOpenCreditLinesAndLoans   \n",
       "6           NumberRealEstateLoansOrLines   \n",
       "7   NumberOfTime30-59DaysPastDueNotWorse   \n",
       "8   NumberOfTime60-89DaysPastDueNotWorse   \n",
       "9                NumberOfTimes90DaysLate   \n",
       "10                    NumberOfDependents   \n",
       "\n",
       "                                         说明   类型  \n",
       "0                                90天以上逾期或更差  Y/N  \n",
       "1                                        年龄   整数  \n",
       "2   除房地产和汽车贷款等无分期付款债务外，信用卡和个人信用额度的总余额除以信贷限额  百分比  \n",
       "3              债务比（每月偿还的债务，赡养费，生活费除以每月的总收入）  百分比  \n",
       "4                                      每月收入   实数  \n",
       "5            公开贷款(如汽车贷款或抵押贷款)和信用额度(如信用卡)的数量   整数  \n",
       "6                   抵押贷款和房地产贷款的额度（包括房屋净值信贷）   整数  \n",
       "7                 借款人逾期30-59天的次数，但在过去两年没有更糟   整数  \n",
       "8                借款人逾期60-89天的次数，但在过去两年没有更糟    整数  \n",
       "9                          借款人逾期90天（或以上）的次数   整数  \n",
       "10                    除自己(配偶、子女等)以外的家庭受养人人数   整数  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "field_desc = pd.read_excel('./field_desc.xlsx')\n",
    "field_desc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:56:59.346975Z",
     "start_time": "2020-12-13T06:56:59.320974Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    139974\n",
       "1     10026\n",
       "Name: SeriousDlqin2yrs, dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['SeriousDlqin2yrs'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:03.851233Z",
     "start_time": "2020-12-13T06:56:59.354976Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1c452be0>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#对违约率进行可视化,1的情况比较少\n",
    "import seaborn as sns\n",
    "\n",
    "sns.countplot(x='SeriousDlqin2yrs',data=df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:03.873234Z",
     "start_time": "2020-12-13T06:57:03.858233Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.06684"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['SeriousDlqin2yrs'].sum() / df_train.shape[0]\n",
    "\n",
    "#违约率为%6"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:03.910236Z",
     "start_time": "2020-12-13T06:57:03.880234Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "SeriousDlqin2yrs                            0\n",
       "RevolvingUtilizationOfUnsecuredLines        0\n",
       "age                                         0\n",
       "NumberOfTime30-59DaysPastDueNotWorse        0\n",
       "DebtRatio                                   0\n",
       "MonthlyIncome                           29731\n",
       "NumberOfOpenCreditLinesAndLoans             0\n",
       "NumberOfTimes90DaysLate                     0\n",
       "NumberRealEstateLoansOrLines                0\n",
       "NumberOfTime60-89DaysPastDueNotWorse        0\n",
       "NumberOfDependents                       3924\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#查看有无缺失值\n",
    "df_train.isnull().sum()"
   ]
  },
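  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The counts above show missing values only in `MonthlyIncome` and `NumberOfDependents`. Because WOE operates on bins, one option (an assumption here, not necessarily the approach this notebook takes) is to keep missing values as a bin of their own rather than imputing them. The bin edges and the toy series below are made up for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Toy income values with gaps, for illustration only\n",
    "income = pd.Series([2600.0, np.nan, 9120.0, 3300.0, np.nan, 63588.0])\n",
    "\n",
    "# Hypothetical bin edges; NaN values fall outside every interval\n",
    "binned = pd.cut(income, bins=[-np.inf, 3000, 6000, np.inf])\n",
    "\n",
    "# Convert to plain objects so a 'missing' label can fill the gaps\n",
    "binned = binned.astype(object).fillna('missing')\n",
    "print(binned.value_counts())\n",
    "```\n",
    "\n",
    "The `missing` bin then receives its own WOE value like any other bin, which lets the model use the fact that a field was left blank."
   ]
  },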
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:03.959239Z",
     "start_time": "2020-12-13T06:57:03.917237Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    150000.000000\n",
       "mean          6.048438\n",
       "std         249.755371\n",
       "min           0.000000\n",
       "25%           0.029867\n",
       "50%           0.154181\n",
       "75%           0.559046\n",
       "max       50708.000000\n",
       "Name: RevolvingUtilizationOfUnsecuredLines, dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['RevolvingUtilizationOfUnsecuredLines'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:04.962296Z",
     "start_time": "2020-12-13T06:57:03.965239Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1c629a20>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEGCAYAAACO8lkDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAZBElEQVR4nO3dfbRddX3n8ffHRMCngkikDA8NdtLpitYyeIvgw0ixVUJrY0dRsA6RYYYyI8uHGVcbpuOMdrUd1I5aFGGYDgq2FGjVEgVFViy2SwUJAoGgKREVsmABPjRKUTH4nT/2L+Rw9sm9OzchDzfv11pnnX1++/fbZ39zbu7n7L3P+d1UFZIkjXrCzt4BSdKux3CQJPUYDpKkHsNBktRjOEiSeubv7B3YHg444IBauHDhzt4NSdqt3Hjjjd+uqgWT1s2JcFi4cCGrVq3a2bshSbuVJN/a0jpPK0mSegwHSVLPoHBIcnyStUnWJVk+YX2SnNPWr05y5Exjk5yYZE2SnyaZGtveWa3/2iQv35YCJUlbb8ZwSDIPOBdYAiwGTk6yeKzbEmBRu50OnDdg7G3AvwX+fuz5FgMnAc8Gjgc+1LYjSdpBhhw5HAWsq6o7q+ph4FJg6VifpcDF1bkO2C/JQdONraqvVtXaCc+3FLi0qn5cVd8A1rXtSJJ2kCHhcDBw98jj9a1tSJ8hY2fzfJKkx9GQcMiEtvGpXLfUZ8jY2TwfSU5PsirJqgceeGCGTUqStsaQcFgPHDry+BDgnoF9hoydzfNRVRdU1VRVTS1YMPE7HJKkWRoSDjcAi5IcnmQvuovFK8b6rABOaZ9aOhrYUFX3Dhw7bgVwUpK9kxxOd5H7y1tRkyRpG834Demq2pjkTOBqYB5wYVWtSXJGW38+cBVwAt3F44eAU6cbC5Dkt4EPAAuAK5PcXFUvb9u+HLgd2Ai8saoe2a5Vj7nk+rsmtr/u+Yc9nk8rSbuszIW/BDc1NVXbMn2G4SBpT5TkxqqamrTOb0hLknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6BoVDkuOTrE2yLsnyCeuT5Jy2fnWSI2cam2T/JNckuaPdP721PzHJRUluTfLVJGdtj0IlScPNGA5J5gHnAkuAxcDJSRaPdVsCLGq304HzBoxdDqysqkXAyvYY4ERg76r6JeB5wO8mWTjL+iRJszDkyOEoYF1V3VlVDwOXAkvH+iwFLq7OdcB+SQ6aYexS4KK2fBHwyrZcwFOSzAeeBDwMfH925UmSZmNIOBwM3D3yeH1rG9JnurEHVtW9AO3+ma39b4B/Bu4F7gL+tKq+O2A/JUnbyZBwyIS2GthnyNhxRwGPAP8COBz4r0me1dup5PQkq5KseuCBB2bYpCRpawwJh/XAoSOPDwHuGdhnurH3tVNPtPv7W/vrgM9U1U+q6n7gC8DU+E5V1QVVNVVVUwsWLBhQhiRpqCHhcAOwKMnhSfYCTgJWjPVZAZzSPrV0NLChnSqabuwKYFlbXgZc0ZbvAo5r23oKcDTwtVnWJ0mahfkzdaiqjUnOBK4G5gEXVtWaJGe09ecDVwEnAOuAh4BTpxvbNn02cHmS0+gC4cTWfi7wYeA2utNSH66q1dujWEnSMDOGA0BVXUUXAKNt548sF/DGoWNb+3eAl05of5DNQSFJ2gn8hrQkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRj
OEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKlnUDgkOT7J2iTrkiyfsD5JzmnrVyc5cqaxSfZPck2SO9r900fWPTfJl5KsSXJrkn22tVBJ0nAzhkOSecC5wBJgMXByksVj3ZYAi9rtdOC8AWOXAyurahGwsj0myXzgL4AzqurZwLHAT2ZfoiRpaw05cjgKWFdVd1bVw8ClwNKxPkuBi6tzHbBfkoNmGLsUuKgtXwS8si2/DFhdVbcAVNV3quqRWdYnSZqFIeFwMHD3yOP1rW1In+nGHlhV9wK0+2e29l8AKsnVSb6S5PeGFCJJ2n7mD+iTCW01sM+QsZP26UXArwAPASuT3FhVKx/zhMnpdKewOOyww2bYpCRpaww5clgPHDry+BDgnoF9pht7Xzv1RLu/f2Rbn6+qb1fVQ8BVwJGMqaoLqmqqqqYWLFgwoAxJ0lBDwuEGYFGSw5PsBZwErBjrswI4pX1q6WhgQztVNN3YFcCytrwMuKItXw08N8mT28XplwC3z7I+SdIszHhaqao2JjmT7pf2PODCqlqT5Iy2/ny6d/cnAOvoTgWdOt3YtumzgcuTnAbcBZzYxnwvyXvpgqWAq6rqyu1VsCRpZqma6RLArm9qaqpWrVo16/GXXH/XxPbXPd9rGZLmrnY9d2rSOr8hLUnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9g8IhyfFJ1iZZl2T5hPVJck5bvzrJkTONTbJ/kmuS3NHunz62zcOSPJjkbdtSoCRp680YDknmAecCS4DFwMlJFo91WwIsarfTgfMGjF0OrKyqRcDK9njU+4BPz6ImSdI2GnLkcBSwrqrurKqHgUuBpWN9lgIXV+c6YL8kB80wdilwUVu+CHjlpo0leSVwJ7BmlnVJkrbBkHA4GLh75PH61jakz3RjD6yqewHa/TMBkjwF+H3gndPtVJLTk6xKsuqBBx4YUIYkaagh4ZAJbTWwz5Cx494JvK+qHpyuU1VdUFVTVTW1YMGCGTYpSdoa8wf0WQ8cOvL4EOCegX32mmbsfUkOqqp72ymo+1v784FXJ3k3sB/w0yQ/qqoPDilIkrTthhw53AAsSnJ4kr2Ak4AVY31WAKe0Ty0dDWxop4qmG7sCWNaWlwFXAFTVi6tqYVUtBN4P/InBIEk71oxHDlW1McmZwNXAPODCqlqT5Iy2/nzgKuAEYB3wEHDqdGPbps8GLk9yGnAXcOJ2rUySNGtDTitRVVfRBcBo2/kjywW8cejY1v4d4KUzPO87huyfJGn78hvSkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKknkHhkOT4JGuTrEuyfML6JDmnrV+d5MiZxibZP8k1Se5o909v7b+e5MYkt7b747ZHoZKk4WYMhyTzgHOBJcBi4OQki8e6LQEWtdvpwHkDxi4HVlbVImBlewzwbeAVVfVLwDLgo7OuTpI0K0OOHI4C1lXVnVX1MHApsHSsz1Lg4upcB+yX5KAZxi4FLmrLFwGvBKiqm6rqnta+Btgnyd6zrE+SNAtDwuFg4O6Rx+tb25A+0409sKruBWj3z5zw3K8CbqqqH4+vSHJ6klVJVj3wwAMDypAkDTUkHDKhrQb2GTJ28pMm
zwbeBfzupPVVdUFVTVXV1IIFC4ZsUpI00JBwWA8cOvL4EOCegX2mG3tfO/VEu79/U6ckhwCfAE6pqq8P2EdJ0nY0JBxuABYlOTzJXsBJwIqxPiuAU9qnlo4GNrRTRdONXUF3wZl2fwVAkv2AK4GzquoL21CbJGmW5s/Uoao2JjkTuBqYB1xYVWuSnNHWnw9cBZwArAMeAk6dbmzb9NnA5UlOA+4CTmztZwL/Enh7kre3tpdV1aNHFpKkx1eqBl0C2KVNTU3VqlWrZj3+kuvvmtj+uucfNuttStKuLsmNVTU1aZ3fkJYk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUMCockxydZm2RdkuUT1ifJOW396iRHzjQ2yf5JrklyR7t/+si6s1r/tUlevq1FSpK2zozhkGQecC6wBFgMnJxk8Vi3JcCidjsdOG/A2OXAyqpaBKxsj2nrTwKeDRwPfKhtR5K0g8wf0OcoYF1V3QmQ5FJgKXD7SJ+lwMVVVcB1SfZLchCwcJqxS4Fj2/iLgGuB32/tl1bVj4FvJFnX9uFLsy9zslvXb+C1F3yJjY/UxPV/dOXtE9vniuzsHdhOJr960p7h+Of8LO99zRHbfbtDwuFg4O6Rx+uB5w/oc/AMYw+sqnsBqureJM8c2dZ1E7b1GElOpztKAXgwydoBtWzJAcC3t2H87sZ65zbrndseU+9Xgfe9dtbb+rktrRgSDpPeYI6/WdtSnyFjZ/N8VNUFwAUzbGuQJKuqamp7bGt3YL1zm/XObTuq3iEXpNcDh448PgS4Z2Cf6cbe10490e7v34rnkyQ9joaEww3AoiSHJ9mL7mLxirE+K4BT2qeWjgY2tFNG041dASxry8uAK0baT0qyd5LD6S5yf3mW9UmSZmHG00pVtTHJmcDVwDzgwqpak+SMtv584CrgBGAd8BBw6nRj26bPBi5PchpwF3BiG7MmyeV0F603Am+sqke2V8FbsF1OT+1GrHdus965bYfUm+4DRpIkbeY3pCVJPYaDJKlnjw6HmaYF2ZUluTDJ/UluG2nb6ilJkjwvya1t3TlJ0tr3TnJZa78+ycIdWd+oJIcm+bskX02yJsmbW/tcrXefJF9Ockur952tfU7Wu0mSeUluSvKp9niu1/vNtq83J1nV2nadmqtqj7zRXSD/OvAsYC/gFmDxzt6vrdj/fwMcCdw20vZuYHlbXg68qy0vbvXtDRze6p7X1n0ZOIbu+yWfBpa09v8MnN+WTwIu24m1HgQc2ZafBvxjq2mu1hvgqW35icD1wNFztd6Ruv8LcAnwqbn88zxS7zeBA8badpmad+o/zk5+YY4Brh55fBZw1s7er62sYSGPDYe1wEFt+SBg7aTa6D49dkzr87WR9pOB/zPapy3Pp/tGZnZ2zW1/rgB+fU+oF3gy8BW6mQXmbL1032daCRzH5nCYs/W2/fgm/XDYZWrek08rbWnKj93ZY6YkAUanJNnS9CbrJ7Q/ZkxVbQQ2AM943PZ8oHZo/K/p3k3P2XrbKZab6b4cek1Vzel6gfcDvwf8dKRtLtcL3cwPn01yY7rpgGAXqnnI9Blz1Wym9thdzWZ6k13u3yfJU4GPAW+pqu+3U6sTu05o263qre67PUck2Q/4RJLnTNN9t643yW8C91fVjUmOHTJkQttuU++IF1bVPenmlbsmydem6bvDa96Tjxzm4jQdWzslyfq2PN7+mDFJ5gP7At993PZ8BkmeSBcMf1lVH2/Nc7beTarqn+hmLD6euVvvC4HfSvJN4FLguCR/wdytF4Cquqfd3w98gm726V2m5j05HIZMC7K72aopSdph6w+SHN0+
4XDK2JhN23o18LlqJy93tLZv/w/4alW9d2TVXK13QTtiIMmTgF8DvsYcrbeqzqqqQ6pqId3/w89V1euZo/UCJHlKkqdtWgZeBtzGrlTzzrwgs7NvdFN+/CPdlf8/2Nn7s5X7/lfAvcBP6N4hnEZ3PnElcEe733+k/x+0OtfSPs3Q2qfaD+XXgQ+y+Vvz+wB/TTclypeBZ+3EWl9Edzi8Gri53U6Yw/U+F7ip1Xsb8D9a+5ysd6z2Y9l8QXrO1kv3Kclb2m3Npt8/u1LNTp8hSerZk08rSZK2wHCQJPUYDpKkHsNBktRjOEiSegyHPUiSR9oMkLcl+eSmz9Jvx+1/JMmrZ+hz1WyfN8kbknxwrO3aJFNt+b+Nrftiu1+YNnttkqkk58ziuRcmed3I41ltZ2T8IUmuaLNvfj3Jn7Xv22xa/1dJVid562iN4/XsTsZeh2PTZl8d6/PnSRbv+L3TOMNhz/LDqjqiqp5D903JN+7oHaiqE6r71u/j4THhUFUvmPD8q6rqTbPY9kLg0XDYhu1s+lLfx4G/rapFwC8ATwX+uK3/WeAFVfXcqnrfbJ5jR2rfvt0uquo/VNXt22t7mj3DYc/1JdoEXUl+Psln2gRg/5DkF5Psm26++Se0Pk9OcneSJyY5Isl17Z3tJzIy53zruyTd3wHf9PjYJJ9sy99MckB7F/nVJP833d8s+Gz7NjBJfqVt+0tJ3jPkXXKSs4EntSOjv2xtD07o9+g71nYUc3O7bUiyrO3XPyT5SrttCpizgRe3vm8d287+Sf627fN1SZ7b2t+R7u9uXJvkziSbwuQ44EdV9WF4dB6ltwL/PsmTgc8Cz2zP9eIZ6n5Dko+31++OJO9u7fPakdxt6eb6f+uWXuvWfmB7LW9ptxeMH6EkeVuSd7Tla5P8SZLPA29O9zcFPt+2e3U2TwHxvLa9LzHgzUgeeyT4YJI/buOvS3Jga1+Q5GNJbmi3F7b2l4y8njelfQNZs7SzvxHpbcfdgAfb/Ty6b04e3x6vBBa15efTfc0euq/h/2pbfi3w5215NfCStvyHwPvb8kfovqY/H7gLeEprPw94fVv+JnAA3TvxjcARrf3ykT630b1zhu6X8m1t+Q3AB8dquhaYGq1vQr0LR7ZxLO0buCP9ntdq2pduiux9WvsiYNWkcTz2m7wfAP5nWz4OuLktvwP4It0c/AcA36H7+wxvAt434fW5ie7b0Y/u73iNE+p5A3Bn2/d9gG/RzafzPLrZXDeN2W+G1/oyugkNofv52HfCfrwNeMfIPn2oLT+x1blg5Gflwgk/K++Z7nWY8HoW8Iq2/G7gv7flS4AXteXD6KZVAfgk3WR20B2Jzd/Z/+d259uePCvrnuhJ6aaBXgjcSDcT5FOBFwB/nc2znO7d7i+j+4/+d3Rz3nwoyb50v2g+3/pcRBc0j6qqjUk+A7wiyd8Av0E3HfO4b1TVzW35RmBhuusRT6uqL7b2S4Df3LTpLdQ166/5JzkA+Cjwmqra0Or7YJIjgEfoTvnM5EXAqwCq6nNJntG2A3BlVf0Y+HGS+4ED6WbLnLTPW2qfqW1lVW1o9dwO/BzdlAzPSvIB4Eq6qaGne62Po5uXh+qOZDaMHxFOcFm7/1fAc+h+nqALl3sn/Kx8FFgywzZHPQxsui5xI93f8IBurqnFIzX8TDtK+ALw3nbk+PGqGp3KWlvJcNiz/LCqjmj/aT9Fd5j/EeCfquqICf1XAP8ryf5070Q/R/eObIjL2va/C9xQVT+Y0OfHI8uPAE9i8jTDm3wHGP+FtT/dHzHZaknm0c0C+odVten0yVuB+4Bfpjvt+qMhm5rQtumX93iN8+l+cb9qbF9+hu4d/9fZPIf/JuN1j9fce46q+l6SXwZeTvc6vAZ4C1t+rSfZyGNPPe8ztv6fN+0+sKaqjhmraT+2bVrsn1Q7DGDzvx1tn46pqh+O9T87yZV0825dl+TXqmq6abA1Da857IHau8w30Z0m+CHwjSQn
QnextP1SoaoepJuw68/oTgE80sZ+b+Rc+L8DPj/+HHSnB44E/iOb32EO2bfv0WaZbE0njay+AXhhugu2tHPTe7P5j6D8JN3U3kOdDayuqktH2vYF7q2qn9LVNq+1/4DuT5RO8vfA77R9Ohb4dlV9f5rnXQk8Ockpbcw84H8DH6mqhyb0vxZ4fTa/VV5GdzS3Re2I6AlV9THg7XR/ZvX7bOG1bvv0nzbtTwur++iufTwjyd5sPoIbtxZYkOSYNv6JSZ5d3QcPNiR5Uev3O9Pt81b4LHDmSK1HtPufr6pbq+pdwCrgF7fT8+2RDIc9VFXdRDcj5El0/2lPS7JphsilI10vA17PY3/BLwPek2Q1cATddYfx7T9Cd3SyhM2nBoY6DbigXcQM3V+woqruA94MXNVOj70fOLn9Ige4AFjdTisM8TbgZSMXMX8L+BCwLMl1dKeUNr07Xg1sbBdH3zq2nXcAU+3f42w2T5M8UXs3/NvAiUnuoJsZ+EeMfdpqxAV04XRLe42eCvzpDLUdDFzb/p0+QvdnJmHLr/WbgV9NcivdKZxnV9VP6F7b6+lew4nvwqvqYbprTe9q272Z7vQVwKnAue21HH+n/9Ik60duxzDMm2j/3u002hmt/S3pLsDf0p7r0wO3pwmclVW7nCRPbUctJFlO9zd137yTd0vao3jNQbui30hyFt3P57foPpEjaQfyyEGS1OM1B0lSj+EgSeoxHCRJPYaDJKnHcJAk9fx/MgGojCeSEx8AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
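   ],
   "source": [
    "# Plot the distribution of values as a histogram\n",
    "# Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot is its replacement\n",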
    "sns.distplot(df_train['RevolvingUtilizationOfUnsecuredLines'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:05.191309Z",
     "start_time": "2020-12-13T06:57:04.967297Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6762.092451637375\n",
      "5437.0\n",
      "3008750.0\n",
      "1.0\n"
     ]
    }
   ],
   "source": [
    "# MonthlyIncome: monthly income. The mean is sensitive to extreme values, so missing values are filled with the median.\n",
    "# Rows with zero income are excluded from these summary statistics.\n",
    "income = df_train.loc[df_train['MonthlyIncome'] != 0, 'MonthlyIncome']\n",
    "print(income.mean())    # mean\n",
    "print(income.median())  # median\n",
    "print(income.max())     # maximum\n",
    "print(income.min())     # minimum"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:05.249313Z",
     "start_time": "2020-12-13T06:57:05.195310Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.7572222678605657\n",
      "0.0\n",
      "20.0\n",
      "0.0\n",
      "count    146076.000000\n",
      "mean          0.757222\n",
      "std           1.115086\n",
      "min           0.000000\n",
      "25%           0.000000\n",
      "50%           0.000000\n",
      "75%           1.000000\n",
      "max          20.000000\n",
      "Name: NumberOfDependents, dtype: float64\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0.0     86902\n",
       "1.0     26316\n",
       "2.0     19522\n",
       "3.0      9483\n",
       "4.0      2862\n",
       "5.0       746\n",
       "6.0       158\n",
       "7.0        51\n",
       "8.0        24\n",
       "9.0         5\n",
       "10.0        5\n",
       "13.0        1\n",
       "20.0        1\n",
       "Name: NumberOfDependents, dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NumberOfDependents: number of dependents in the family, excluding the borrower (spouse, children, etc.); missing values are filled with the median\n",
    "print(df_train['NumberOfDependents'].mean())  # mean\n",
    "print(df_train['NumberOfDependents'].median())  # median\n",
    "print(df_train['NumberOfDependents'].max())  # maximum\n",
    "print(df_train['NumberOfDependents'].min())  # minimum\n",
    "print(df_train.NumberOfDependents.describe())\n",
    "df_train.NumberOfDependents.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:05.391321Z",
     "start_time": "2020-12-13T06:57:05.254313Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 150000 entries, 0 to 149999\n",
      "Data columns (total 11 columns):\n",
      " #   Column                                Non-Null Count   Dtype  \n",
      "---  ------                                --------------   -----  \n",
      " 0   SeriousDlqin2yrs                      150000 non-null  int64  \n",
      " 1   RevolvingUtilizationOfUnsecuredLines  150000 non-null  float64\n",
      " 2   age                                   150000 non-null  int64  \n",
      " 3   NumberOfTime30-59DaysPastDueNotWorse  150000 non-null  int64  \n",
      " 4   DebtRatio                             150000 non-null  float64\n",
      " 5   MonthlyIncome                         150000 non-null  float64\n",
      " 6   NumberOfOpenCreditLinesAndLoans       150000 non-null  int64  \n",
      " 7   NumberOfTimes90DaysLate               150000 non-null  int64  \n",
      " 8   NumberRealEstateLoansOrLines          150000 non-null  int64  \n",
      " 9   NumberOfTime60-89DaysPastDueNotWorse  150000 non-null  int64  \n",
      " 10  NumberOfDependents                    150000 non-null  float64\n",
      "dtypes: float64(4), int64(7)\n",
      "memory usage: 12.6 MB\n"
     ]
    }
   ],
   "source": [
    "# Fill missing values in every column with that column's median\n",
    "df_train = df_train.fillna(df_train.median())\n",
    "df_train.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Binning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:05.465325Z",
     "start_time": "2020-12-13T06:57:05.397321Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>bin_age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>45</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>40</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>38</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>49</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149995</th>\n",
       "      <td>74</td>\n",
       "      <td>(70.0, inf]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149996</th>\n",
       "      <td>44</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149997</th>\n",
       "      <td>58</td>\n",
       "      <td>(50.0, 60.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149998</th>\n",
       "      <td>30</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149999</th>\n",
       "      <td>64</td>\n",
       "      <td>(60.0, 70.0]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150000 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        age       bin_age\n",
       "0        45  (40.0, 50.0]\n",
       "1        40  (25.0, 40.0]\n",
       "2        38  (25.0, 40.0]\n",
       "3        30  (25.0, 40.0]\n",
       "4        49  (40.0, 50.0]\n",
       "...     ...           ...\n",
       "149995   74   (70.0, inf]\n",
       "149996   44  (40.0, 50.0]\n",
       "149997   58  (50.0, 60.0]\n",
       "149998   30  (25.0, 40.0]\n",
       "149999   64  (60.0, 70.0]\n",
       "\n",
       "[150000 rows x 2 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Bin the age field\n",
    "import math\n",
    "\n",
    "age_bins = [-math.inf, 25, 40, 50, 60, 70, math.inf]\n",
    "df_train['bin_age'] = pd.cut(df_train['age'], bins=age_bins)\n",
    "df_train[['age','bin_age']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:05.539329Z",
     "start_time": "2020-12-13T06:57:05.470325Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>NumberOfDependents</th>\n",
       "      <th>bin_NumberOfDependents</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149995</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149996</th>\n",
       "      <td>2.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149997</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149998</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149999</th>\n",
       "      <td>0.0</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150000 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        NumberOfDependents bin_NumberOfDependents\n",
       "0                      2.0            (-inf, 2.0]\n",
       "1                      1.0            (-inf, 2.0]\n",
       "2                      0.0            (-inf, 2.0]\n",
       "3                      0.0            (-inf, 2.0]\n",
       "4                      0.0            (-inf, 2.0]\n",
       "...                    ...                    ...\n",
       "149995                 0.0            (-inf, 2.0]\n",
       "149996                 2.0            (-inf, 2.0]\n",
       "149997                 0.0            (-inf, 2.0]\n",
       "149998                 0.0            (-inf, 2.0]\n",
       "149999                 0.0            (-inf, 2.0]\n",
       "\n",
       "[150000 rows x 2 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NumberOfDependents: number of dependents in the family, excluding the borrower (spouse, children, etc.); bin the values\n",
    "dependent_bins = [-math.inf, 2, 4, 6, 8, 10, math.inf]\n",
    "df_train['bin_NumberOfDependents'] = pd.cut(df_train['NumberOfDependents'], bins=dependent_bins)\n",
    "df_train[['NumberOfDependents','bin_NumberOfDependents']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:05.691338Z",
     "start_time": "2020-12-13T06:57:05.559330Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bin_NumberOfTime30-59DaysPastDueNotWorse</th>\n",
       "      <th>bin_NumberOfTime60-89DaysPastDueNotWorse</th>\n",
       "      <th>bin_NumberOfTimes90DaysLate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(1.0, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149995</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149996</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149997</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149998</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149999</th>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150000 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       bin_NumberOfTime30-59DaysPastDueNotWorse  \\\n",
       "0                                    (1.0, 2.0]   \n",
       "1                                   (-inf, 1.0]   \n",
       "2                                   (-inf, 1.0]   \n",
       "3                                   (-inf, 1.0]   \n",
       "4                                   (-inf, 1.0]   \n",
       "...                                         ...   \n",
       "149995                              (-inf, 1.0]   \n",
       "149996                              (-inf, 1.0]   \n",
       "149997                              (-inf, 1.0]   \n",
       "149998                              (-inf, 1.0]   \n",
       "149999                              (-inf, 1.0]   \n",
       "\n",
       "       bin_NumberOfTime60-89DaysPastDueNotWorse bin_NumberOfTimes90DaysLate  \n",
       "0                                   (-inf, 1.0]                 (-inf, 1.0]  \n",
       "1                                   (-inf, 1.0]                 (-inf, 1.0]  \n",
       "2                                   (-inf, 1.0]                 (-inf, 1.0]  \n",
       "3                                   (-inf, 1.0]                 (-inf, 1.0]  \n",
       "4                                   (-inf, 1.0]                 (-inf, 1.0]  \n",
       "...                                         ...                         ...  \n",
       "149995                              (-inf, 1.0]                 (-inf, 1.0]  \n",
       "149996                              (-inf, 1.0]                 (-inf, 1.0]  \n",
       "149997                              (-inf, 1.0]                 (-inf, 1.0]  \n",
       "149998                              (-inf, 1.0]                 (-inf, 1.0]  \n",
       "149999                              (-inf, 1.0]                 (-inf, 1.0]  \n",
       "\n",
       "[150000 rows x 3 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Bin the three past-due counts: NumberOfTime30-59DaysPastDueNotWorse, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTimes90DaysLate\n",
    "dpd_bins = [-math.inf, 1, 2, 3, 4, 5, 6, 7, 8, 9, math.inf]\n",
    "df_train['bin_NumberOfTime30-59DaysPastDueNotWorse'] = pd.cut(df_train['NumberOfTime30-59DaysPastDueNotWorse'], bins=dpd_bins)\n",
    "df_train['bin_NumberOfTime60-89DaysPastDueNotWorse'] = pd.cut(df_train['NumberOfTime60-89DaysPastDueNotWorse'], bins=dpd_bins)\n",
    "df_train['bin_NumberOfTimes90DaysLate'] = pd.cut(df_train['NumberOfTimes90DaysLate'], bins=dpd_bins)\n",
    "# Inspect the binning result\n",
    "df_train[['bin_NumberOfTime30-59DaysPastDueNotWorse','bin_NumberOfTime60-89DaysPastDueNotWorse','bin_NumberOfTimes90DaysLate']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.120363Z",
     "start_time": "2020-12-13T06:57:05.695338Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SeriousDlqin2yrs</th>\n",
       "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
       "      <th>age</th>\n",
       "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
       "      <th>DebtRatio</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>NumberOfTimes90DaysLate</th>\n",
       "      <th>NumberRealEstateLoansOrLines</th>\n",
       "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
       "      <th>...</th>\n",
       "      <th>bin_age</th>\n",
       "      <th>bin_NumberOfDependents</th>\n",
       "      <th>bin_NumberOfTime30-59DaysPastDueNotWorse</th>\n",
       "      <th>bin_NumberOfTime60-89DaysPastDueNotWorse</th>\n",
       "      <th>bin_NumberOfTimes90DaysLate</th>\n",
       "      <th>bin_RevolvingUtilizationOfUnsecuredLines</th>\n",
       "      <th>bin_DebtRatio</th>\n",
       "      <th>bin_MonthlyIncome</th>\n",
       "      <th>bin_NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>bin_NumberRealEstateLoansOrLines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.766127</td>\n",
       "      <td>45</td>\n",
       "      <td>2</td>\n",
       "      <td>0.802982</td>\n",
       "      <td>9120.0</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(1.0, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.699, 50708.0]</td>\n",
       "      <td>(0.468, 4.0]</td>\n",
       "      <td>(8250.0, 3008750.0]</td>\n",
       "      <td>(12.0, 58.0]</td>\n",
       "      <td>(2.0, 54.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0.957151</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0.121876</td>\n",
       "      <td>2600.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.699, 50708.0]</td>\n",
       "      <td>(-0.001, 0.134]</td>\n",
       "      <td>(-0.001, 3400.0]</td>\n",
       "      <td>(-0.001, 4.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0.658180</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>0.085113</td>\n",
       "      <td>3042.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.271, 0.699]</td>\n",
       "      <td>(-0.001, 0.134]</td>\n",
       "      <td>(-0.001, 3400.0]</td>\n",
       "      <td>(-0.001, 4.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0.233810</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.036050</td>\n",
       "      <td>3300.0</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.0832, 0.271]</td>\n",
       "      <td>(-0.001, 0.134]</td>\n",
       "      <td>(-0.001, 3400.0]</td>\n",
       "      <td>(4.0, 6.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0.907239</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>0.024926</td>\n",
       "      <td>63588.0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.699, 50708.0]</td>\n",
       "      <td>(-0.001, 0.134]</td>\n",
       "      <td>(8250.0, 3008750.0]</td>\n",
       "      <td>(6.0, 9.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149995</th>\n",
       "      <td>0</td>\n",
       "      <td>0.040674</td>\n",
       "      <td>74</td>\n",
       "      <td>0</td>\n",
       "      <td>0.225131</td>\n",
       "      <td>2100.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(70.0, inf]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.0192, 0.0832]</td>\n",
       "      <td>(0.134, 0.287]</td>\n",
       "      <td>(-0.001, 3400.0]</td>\n",
       "      <td>(-0.001, 4.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149996</th>\n",
       "      <td>0</td>\n",
       "      <td>0.299745</td>\n",
       "      <td>44</td>\n",
       "      <td>0</td>\n",
       "      <td>0.716562</td>\n",
       "      <td>5584.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.271, 0.699]</td>\n",
       "      <td>(0.468, 4.0]</td>\n",
       "      <td>(5400.0, 8250.0]</td>\n",
       "      <td>(-0.001, 4.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149997</th>\n",
       "      <td>0</td>\n",
       "      <td>0.246044</td>\n",
       "      <td>58</td>\n",
       "      <td>0</td>\n",
       "      <td>3870.000000</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(50.0, 60.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.0832, 0.271]</td>\n",
       "      <td>(4.0, 329664.0]</td>\n",
       "      <td>(3400.0, 5400.0]</td>\n",
       "      <td>(12.0, 58.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149998</th>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5716.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-0.001, 0.0192]</td>\n",
       "      <td>(-0.001, 0.134]</td>\n",
       "      <td>(5400.0, 8250.0]</td>\n",
       "      <td>(-0.001, 4.0]</td>\n",
       "      <td>(-0.001, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149999</th>\n",
       "      <td>0</td>\n",
       "      <td>0.850283</td>\n",
       "      <td>64</td>\n",
       "      <td>0</td>\n",
       "      <td>0.249908</td>\n",
       "      <td>8158.0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>(60.0, 70.0]</td>\n",
       "      <td>(-inf, 2.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>(0.699, 50708.0]</td>\n",
       "      <td>(0.134, 0.287]</td>\n",
       "      <td>(5400.0, 8250.0]</td>\n",
       "      <td>(6.0, 9.0]</td>\n",
       "      <td>(1.0, 2.0]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150000 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  \\\n",
       "0                      1                              0.766127   45   \n",
       "1                      0                              0.957151   40   \n",
       "2                      0                              0.658180   38   \n",
       "3                      0                              0.233810   30   \n",
       "4                      0                              0.907239   49   \n",
       "...                  ...                                   ...  ...   \n",
       "149995                 0                              0.040674   74   \n",
       "149996                 0                              0.299745   44   \n",
       "149997                 0                              0.246044   58   \n",
       "149998                 0                              0.000000   30   \n",
       "149999                 0                              0.850283   64   \n",
       "\n",
       "        NumberOfTime30-59DaysPastDueNotWorse    DebtRatio  MonthlyIncome  \\\n",
       "0                                          2     0.802982         9120.0   \n",
       "1                                          0     0.121876         2600.0   \n",
       "2                                          1     0.085113         3042.0   \n",
       "3                                          0     0.036050         3300.0   \n",
       "4                                          1     0.024926        63588.0   \n",
       "...                                      ...          ...            ...   \n",
       "149995                                     0     0.225131         2100.0   \n",
       "149996                                     0     0.716562         5584.0   \n",
       "149997                                     0  3870.000000         5400.0   \n",
       "149998                                     0     0.000000         5716.0   \n",
       "149999                                     0     0.249908         8158.0   \n",
       "\n",
       "        NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  \\\n",
       "0                                    13                        0   \n",
       "1                                     4                        0   \n",
       "2                                     2                        1   \n",
       "3                                     5                        0   \n",
       "4                                     7                        0   \n",
       "...                                 ...                      ...   \n",
       "149995                                4                        0   \n",
       "149996                                4                        0   \n",
       "149997                               18                        0   \n",
       "149998                                4                        0   \n",
       "149999                                8                        0   \n",
       "\n",
       "        NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  \\\n",
       "0                                  6                                     0   \n",
       "1                                  0                                     0   \n",
       "2                                  0                                     0   \n",
       "3                                  0                                     0   \n",
       "4                                  1                                     0   \n",
       "...                              ...                                   ...   \n",
       "149995                             1                                     0   \n",
       "149996                             1                                     0   \n",
       "149997                             1                                     0   \n",
       "149998                             0                                     0   \n",
       "149999                             2                                     0   \n",
       "\n",
       "        ...       bin_age bin_NumberOfDependents  \\\n",
       "0       ...  (40.0, 50.0]            (-inf, 2.0]   \n",
       "1       ...  (25.0, 40.0]            (-inf, 2.0]   \n",
       "2       ...  (25.0, 40.0]            (-inf, 2.0]   \n",
       "3       ...  (25.0, 40.0]            (-inf, 2.0]   \n",
       "4       ...  (40.0, 50.0]            (-inf, 2.0]   \n",
       "...     ...           ...                    ...   \n",
       "149995  ...   (70.0, inf]            (-inf, 2.0]   \n",
       "149996  ...  (40.0, 50.0]            (-inf, 2.0]   \n",
       "149997  ...  (50.0, 60.0]            (-inf, 2.0]   \n",
       "149998  ...  (25.0, 40.0]            (-inf, 2.0]   \n",
       "149999  ...  (60.0, 70.0]            (-inf, 2.0]   \n",
       "\n",
       "       bin_NumberOfTime30-59DaysPastDueNotWorse  \\\n",
       "0                                    (1.0, 2.0]   \n",
       "1                                   (-inf, 1.0]   \n",
       "2                                   (-inf, 1.0]   \n",
       "3                                   (-inf, 1.0]   \n",
       "4                                   (-inf, 1.0]   \n",
       "...                                         ...   \n",
       "149995                              (-inf, 1.0]   \n",
       "149996                              (-inf, 1.0]   \n",
       "149997                              (-inf, 1.0]   \n",
       "149998                              (-inf, 1.0]   \n",
       "149999                              (-inf, 1.0]   \n",
       "\n",
       "       bin_NumberOfTime60-89DaysPastDueNotWorse bin_NumberOfTimes90DaysLate  \\\n",
       "0                                   (-inf, 1.0]                 (-inf, 1.0]   \n",
       "1                                   (-inf, 1.0]                 (-inf, 1.0]   \n",
       "2                                   (-inf, 1.0]                 (-inf, 1.0]   \n",
       "3                                   (-inf, 1.0]                 (-inf, 1.0]   \n",
       "4                                   (-inf, 1.0]                 (-inf, 1.0]   \n",
       "...                                         ...                         ...   \n",
       "149995                              (-inf, 1.0]                 (-inf, 1.0]   \n",
       "149996                              (-inf, 1.0]                 (-inf, 1.0]   \n",
       "149997                              (-inf, 1.0]                 (-inf, 1.0]   \n",
       "149998                              (-inf, 1.0]                 (-inf, 1.0]   \n",
       "149999                              (-inf, 1.0]                 (-inf, 1.0]   \n",
       "\n",
       "       bin_RevolvingUtilizationOfUnsecuredLines    bin_DebtRatio  \\\n",
       "0                              (0.699, 50708.0]     (0.468, 4.0]   \n",
       "1                              (0.699, 50708.0]  (-0.001, 0.134]   \n",
       "2                                (0.271, 0.699]  (-0.001, 0.134]   \n",
       "3                               (0.0832, 0.271]  (-0.001, 0.134]   \n",
       "4                              (0.699, 50708.0]  (-0.001, 0.134]   \n",
       "...                                         ...              ...   \n",
       "149995                         (0.0192, 0.0832]   (0.134, 0.287]   \n",
       "149996                           (0.271, 0.699]     (0.468, 4.0]   \n",
       "149997                          (0.0832, 0.271]  (4.0, 329664.0]   \n",
       "149998                         (-0.001, 0.0192]  (-0.001, 0.134]   \n",
       "149999                         (0.699, 50708.0]   (0.134, 0.287]   \n",
       "\n",
       "          bin_MonthlyIncome bin_NumberOfOpenCreditLinesAndLoans  \\\n",
       "0       (8250.0, 3008750.0]                        (12.0, 58.0]   \n",
       "1          (-0.001, 3400.0]                       (-0.001, 4.0]   \n",
       "2          (-0.001, 3400.0]                       (-0.001, 4.0]   \n",
       "3          (-0.001, 3400.0]                          (4.0, 6.0]   \n",
       "4       (8250.0, 3008750.0]                          (6.0, 9.0]   \n",
       "...                     ...                                 ...   \n",
       "149995     (-0.001, 3400.0]                       (-0.001, 4.0]   \n",
       "149996     (5400.0, 8250.0]                       (-0.001, 4.0]   \n",
       "149997     (3400.0, 5400.0]                        (12.0, 58.0]   \n",
       "149998     (5400.0, 8250.0]                       (-0.001, 4.0]   \n",
       "149999     (5400.0, 8250.0]                          (6.0, 9.0]   \n",
       "\n",
       "       bin_NumberRealEstateLoansOrLines  \n",
       "0                           (2.0, 54.0]  \n",
       "1                         (-0.001, 1.0]  \n",
       "2                         (-0.001, 1.0]  \n",
       "3                         (-0.001, 1.0]  \n",
       "4                         (-0.001, 1.0]  \n",
       "...                                 ...  \n",
       "149995                    (-0.001, 1.0]  \n",
       "149996                    (-0.001, 1.0]  \n",
       "149997                    (-0.001, 1.0]  \n",
       "149998                    (-0.001, 1.0]  \n",
       "149999                       (1.0, 2.0]  \n",
       "\n",
       "[150000 rows x 21 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Bin the remaining fields as well (quantile binning into 5 bins):\n",
    "# RevolvingUtilizationOfUnsecuredLines, DebtRatio, MonthlyIncome, NumberOfOpenCreditLinesAndLoans, NumberRealEstateLoansOrLines\n",
    "# RevolvingUtilizationOfUnsecuredLines: total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits\n",
    "df_train['bin_RevolvingUtilizationOfUnsecuredLines'] = pd.qcut(df_train['RevolvingUtilizationOfUnsecuredLines'], q=5, duplicates='drop')\n",
    "\n",
    "# DebtRatio: monthly debt payments, alimony, and living costs divided by monthly gross income\n",
    "df_train['bin_DebtRatio'] = pd.qcut(df_train['DebtRatio'], q=5, duplicates='drop')\n",
    "\n",
    "# MonthlyIncome: monthly income\n",
    "df_train['bin_MonthlyIncome'] = pd.qcut(df_train['MonthlyIncome'], q=5, duplicates='drop')\n",
    "\n",
    "# NumberOfOpenCreditLinesAndLoans: number of open loans (such as a car loan or mortgage) and lines of credit (such as credit cards)\n",
    "df_train['bin_NumberOfOpenCreditLinesAndLoans'] = pd.qcut(df_train['NumberOfOpenCreditLinesAndLoans'], q=5, duplicates='drop')\n",
    "\n",
    "# NumberRealEstateLoansOrLines: number of mortgage and real estate loans, including home equity lines of credit\n",
    "df_train['bin_NumberRealEstateLoansOrLines'] = pd.qcut(df_train['NumberRealEstateLoansOrLines'], q=5, duplicates='drop')\n",
    "\n",
    "df_train\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.142364Z",
     "start_time": "2020-12-13T06:57:06.124363Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.699, 50708.0]    30000\n",
       "(0.271, 0.699]      30000\n",
       "(0.0832, 0.271]     30000\n",
       "(0.0192, 0.0832]    30000\n",
       "(-0.001, 0.0192]    30000\n",
       "Name: bin_RevolvingUtilizationOfUnsecuredLines, dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['bin_RevolvingUtilizationOfUnsecuredLines'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.162365Z",
     "start_time": "2020-12-13T06:57:06.145364Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.468, 4.0]       30109\n",
       "(0.287, 0.468]     30000\n",
       "(0.134, 0.287]     30000\n",
       "(-0.001, 0.134]    30000\n",
       "(4.0, 329664.0]    29891\n",
       "Name: bin_DebtRatio, dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['bin_DebtRatio'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.182366Z",
     "start_time": "2020-12-13T06:57:06.165365Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3400.0, 5400.0]       59757\n",
       "(-0.001, 3400.0]       30289\n",
       "(8250.0, 3008750.0]    29993\n",
       "(5400.0, 8250.0]       29961\n",
       "Name: bin_MonthlyIncome, dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Only 4 bins remain: duplicate quantile edges were dropped (duplicates='drop')\n",
    "df_train['bin_MonthlyIncome'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.217368Z",
     "start_time": "2020-12-13T06:57:06.185366Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5400.0     30062\n",
       "5000.0      2757\n",
       "4000.0      2106\n",
       "6000.0      1934\n",
       "3000.0      1758\n",
       "           ...  \n",
       "3847.0         1\n",
       "10113.0        1\n",
       "14210.0        1\n",
       "13023.0        1\n",
       "1037.0         1\n",
       "Name: MonthlyIncome, Length: 13594, dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['MonthlyIncome'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.253370Z",
     "start_time": "2020-12-13T06:57:06.224369Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(-0.001, 1.0]    108526\n",
       "(1.0, 2.0]        31522\n",
       "(2.0, 54.0]        9952\n",
       "Name: bin_NumberRealEstateLoansOrLines, dtype: int64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Only 3 bins remain: duplicate quantile edges were dropped (duplicates='drop')\n",
    "df_train['bin_NumberRealEstateLoansOrLines'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.270371Z",
     "start_time": "2020-12-13T06:57:06.256370Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     56188\n",
       "1     52338\n",
       "2     31522\n",
       "3      6300\n",
       "4      2170\n",
       "5       689\n",
       "6       320\n",
       "7       171\n",
       "8        93\n",
       "9        78\n",
       "10       37\n",
       "11       23\n",
       "12       18\n",
       "13       15\n",
       "14        7\n",
       "15        7\n",
       "16        4\n",
       "17        4\n",
       "25        3\n",
       "18        2\n",
       "19        2\n",
       "20        2\n",
       "23        2\n",
       "32        1\n",
       "21        1\n",
       "26        1\n",
       "29        1\n",
       "54        1\n",
       "Name: NumberRealEstateLoansOrLines, dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['NumberRealEstateLoansOrLines'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.281372Z",
     "start_time": "2020-12-13T06:57:06.274371Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['bin_age',\n",
       " 'bin_NumberOfDependents',\n",
       " 'bin_NumberOfTime30-59DaysPastDueNotWorse',\n",
       " 'bin_NumberOfTime60-89DaysPastDueNotWorse',\n",
       " 'bin_NumberOfTimes90DaysLate',\n",
       " 'bin_RevolvingUtilizationOfUnsecuredLines',\n",
       " 'bin_DebtRatio',\n",
       " 'bin_MonthlyIncome',\n",
       " 'bin_NumberOfOpenCreditLinesAndLoans',\n",
       " 'bin_NumberRealEstateLoansOrLines']"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Collect the names of all binned columns\n",
    "bin_cols = [c for c in df_train.columns.values if c.startswith('bin_')]\n",
    "bin_cols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:06.296373Z",
     "start_time": "2020-12-13T06:57:06.285372Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Compute the information value (IV), which measures how well a variable separates good from bad samples\n",
    "def cal_IV(df, feature, target):\n",
    "    lst = []\n",
    "    cols = ['Variable', 'Value', 'All', 'Bad']\n",
    "    # Iterate over every bin value of the feature\n",
    "    for val in df[feature].dropna().unique():\n",
    "        # Target values of the rows that fall into this bin\n",
    "        temp = df[df[feature] == val][target]\n",
    "        # Record: feature, bin value, rows in the bin, rows in the bin with target == 1\n",
    "        lst.append([feature, val, len(temp), temp.sum()])\n",
    "\n",
    "    data = pd.DataFrame(lst, columns=cols)\n",
    "    # Drop bins without bad samples so that the log below stays finite\n",
    "    data = data[data['Bad'] > 0].copy()\n",
    "\n",
    "    data['Share'] = data['All'] / data['All'].sum()  # share of all rows that fall into this bin\n",
    "    data['Bad Rate'] = data['Bad'] / data['All']  # fraction of bad samples within this bin\n",
    "    data['Margin Bad'] = data['Bad'] / data['Bad'].sum()  # this bin's share of all bad samples\n",
    "    data['Margin Good'] = (data['All'] - data['Bad']) / (data['All'].sum() - data['Bad'].sum())  # this bin's share of all good samples\n",
    "    data['woe'] = np.log(data['Margin Bad'] / data['Margin Good'])\n",
    "    # IV = sum over bins of WOE * (bad share - good share); the scalar is broadcast to every row\n",
    "    data['iv'] = (data['woe'] * (data['Margin Bad'] - data['Margin Good'])).sum()\n",
    "    data = data.sort_values(by=['Variable', 'Value'])\n",
    "    return data['iv'].values[0]\n",
    "\n",
    "# cal_IV(df_train, 'bin_age', 'SeriousDlqin2yrs')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:07.416437Z",
     "start_time": "2020-12-13T06:57:06.299373Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>iv</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>bin_RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>1.059619</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bin_NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>0.492445</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>bin_NumberOfTimes90DaysLate</td>\n",
       "      <td>0.491607</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>bin_NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>0.266559</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>bin_age</td>\n",
       "      <td>0.240411</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>bin_DebtRatio</td>\n",
       "      <td>0.059488</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>bin_MonthlyIncome</td>\n",
       "      <td>0.056234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>bin_NumberOfOpenCreditLinesAndLoans</td>\n",
       "      <td>0.048023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>bin_NumberOfDependents</td>\n",
       "      <td>0.014508</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>bin_NumberRealEstateLoansOrLines</td>\n",
       "      <td>0.012091</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    feature        iv\n",
       "5  bin_RevolvingUtilizationOfUnsecuredLines  1.059619\n",
       "2  bin_NumberOfTime30-59DaysPastDueNotWorse  0.492445\n",
       "4               bin_NumberOfTimes90DaysLate  0.491607\n",
       "3  bin_NumberOfTime60-89DaysPastDueNotWorse  0.266559\n",
       "0                                   bin_age  0.240411\n",
       "6                             bin_DebtRatio  0.059488\n",
       "7                         bin_MonthlyIncome  0.056234\n",
       "8       bin_NumberOfOpenCreditLinesAndLoans  0.048023\n",
       "1                    bin_NumberOfDependents  0.014508\n",
       "9          bin_NumberRealEstateLoansOrLines  0.012091"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Compute the IV of every binned field\n",
    "lst2 = []\n",
    "cols2 = ['feature', 'iv']\n",
    "for f in bin_cols:\n",
    "    lst2.append([f, cal_IV(df_train, f, 'SeriousDlqin2yrs')])\n",
    "\n",
    "data2 = pd.DataFrame(lst2, columns=cols2)\n",
    "data2.sort_values(by=['iv'], ascending=False, inplace=True)\n",
    "data2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:07.434438Z",
     "start_time": "2020-12-13T06:57:07.419437Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>iv</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>bin_RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>1.059619</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bin_NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>0.492445</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>bin_NumberOfTimes90DaysLate</td>\n",
       "      <td>0.491607</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>bin_NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>0.266559</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>bin_age</td>\n",
       "      <td>0.240411</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    feature        iv\n",
       "5  bin_RevolvingUtilizationOfUnsecuredLines  1.059619\n",
       "2  bin_NumberOfTime30-59DaysPastDueNotWorse  0.492445\n",
       "4               bin_NumberOfTimes90DaysLate  0.491607\n",
       "3  bin_NumberOfTime60-89DaysPastDueNotWorse  0.266559\n",
       "0                                   bin_age  0.240411"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keep only the features whose IV is greater than 0.1\n",
    "result = data2[data2['iv']>0.1]\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:07.449439Z",
     "start_time": "2020-12-13T06:57:07.439438Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['RevolvingUtilizationOfUnsecuredLines',\n",
       " 'NumberOfTime30-59DaysPastDueNotWorse',\n",
       " 'NumberOfTimes90DaysLate',\n",
       " 'NumberOfTime60-89DaysPastDueNotWorse',\n",
       " 'age']"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Map the features with IV > 0.1 back to their original column names\n",
    "# assign() returns a new frame, which avoids pandas' SettingWithCopyWarning on the filtered result\n",
    "result = result.assign(org_feature=result['feature'].str.replace('bin_', '', regex=False))\n",
    "feature_cols = list(result['org_feature'].values)\n",
    "feature_cols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T07:07:30.780230Z",
     "start_time": "2020-12-13T07:07:27.303031Z"
    }
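   },
   "outputs": [],
   "source": [
    "# Compute the per-bin WOE for each of the given binned features\n",
    "def cal_WOE(df, features, target):\n",
    "    df_new = df.copy()\n",
    "    for f in features:\n",
    "        # Per bin: number of bad samples (sum of target) and total number of samples (count)\n",
    "        df_woe = df_new.groupby(f).agg({target: ['sum', 'count']})\n",
    "        df_woe.columns = list(map(''.join, df_woe.columns.values))\n",
    "        # reset_index() returns a new frame, so assign it to turn the bin back into a column\n",
    "        df_woe = df_woe.reset_index()\n",
    "        df_woe = df_woe.rename(columns={target+'sum': 'bad', target+'count': 'all'})\n",
    "        df_woe['good'] = df_woe['all'] - df_woe['bad']\n",
    "        df_woe['margin_bad'] = df_woe['bad'] / df_woe['bad'].sum()\n",
    "        df_woe['margin_good'] = df_woe['good'] / df_woe['good'].sum()\n",
    "\n",
    "        # Note: np.log1p(x) = log(1 + x), so this is log(1 + bad/good), not the standard WOE log(bad/good).\n",
    "        # It stays finite when a bin has no bad samples, but it does not guard against margin_good == 0.\n",
    "        df_woe['woe'] = np.log1p(df_woe['margin_bad'] / df_woe['margin_good'])\n",
    "\n",
    "        # Suffix the statistic columns with the feature name; keep the bin column as the merge key\n",
    "        df_woe.columns = [c if c == f else c + '_' + f for c in list(df_woe.columns.values)]\n",
    "\n",
    "        df_new = df_new.merge(df_woe, on=f, how='left')\n",
    "    return df_new\n",
    "\n",
    "df_woe = cal_WOE(df_train, bin_cols, 'SeriousDlqin2yrs')"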
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T07:07:32.422324Z",
     "start_time": "2020-12-13T07:07:31.987299Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SeriousDlqin2yrs</th>\n",
       "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
       "      <th>age</th>\n",
       "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
       "      <th>DebtRatio</th>\n",
       "      <th>MonthlyIncome</th>\n",
       "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>NumberOfTimes90DaysLate</th>\n",
       "      <th>NumberRealEstateLoansOrLines</th>\n",
       "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
       "      <th>...</th>\n",
       "      <th>good_bin_NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>margin_bad_bin_NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>margin_good_bin_NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>woe_bin_NumberOfOpenCreditLinesAndLoans</th>\n",
       "      <th>bad_bin_NumberRealEstateLoansOrLines</th>\n",
       "      <th>all_bin_NumberRealEstateLoansOrLines</th>\n",
       "      <th>good_bin_NumberRealEstateLoansOrLines</th>\n",
       "      <th>margin_bad_bin_NumberRealEstateLoansOrLines</th>\n",
       "      <th>margin_good_bin_NumberRealEstateLoansOrLines</th>\n",
       "      <th>woe_bin_NumberRealEstateLoansOrLines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.766127</td>\n",
       "      <td>45</td>\n",
       "      <td>2</td>\n",
       "      <td>0.802982</td>\n",
       "      <td>9120.0</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>25838</td>\n",
       "      <td>0.184121</td>\n",
       "      <td>0.184591</td>\n",
       "      <td>0.691873</td>\n",
       "      <td>841</td>\n",
       "      <td>9952</td>\n",
       "      <td>9111</td>\n",
       "      <td>0.083882</td>\n",
       "      <td>0.065091</td>\n",
       "      <td>0.827981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0.957151</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0.121876</td>\n",
       "      <td>2600.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>30556</td>\n",
       "      <td>0.309495</td>\n",
       "      <td>0.218298</td>\n",
       "      <td>0.882845</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0.658180</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>0.085113</td>\n",
       "      <td>3042.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>30556</td>\n",
       "      <td>0.309495</td>\n",
       "      <td>0.218298</td>\n",
       "      <td>0.882845</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0.233810</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.036050</td>\n",
       "      <td>3300.0</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>24972</td>\n",
       "      <td>0.156892</td>\n",
       "      <td>0.178405</td>\n",
       "      <td>0.630962</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0.907239</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>0.024926</td>\n",
       "      <td>63588.0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>35145</td>\n",
       "      <td>0.201177</td>\n",
       "      <td>0.251082</td>\n",
       "      <td>0.588475</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149995</th>\n",
       "      <td>0</td>\n",
       "      <td>0.040674</td>\n",
       "      <td>74</td>\n",
       "      <td>0</td>\n",
       "      <td>0.225131</td>\n",
       "      <td>2100.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>30556</td>\n",
       "      <td>0.309495</td>\n",
       "      <td>0.218298</td>\n",
       "      <td>0.882845</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149996</th>\n",
       "      <td>0</td>\n",
       "      <td>0.299745</td>\n",
       "      <td>44</td>\n",
       "      <td>0</td>\n",
       "      <td>0.716562</td>\n",
       "      <td>5584.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>30556</td>\n",
       "      <td>0.309495</td>\n",
       "      <td>0.218298</td>\n",
       "      <td>0.882845</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149997</th>\n",
       "      <td>0</td>\n",
       "      <td>0.246044</td>\n",
       "      <td>58</td>\n",
       "      <td>0</td>\n",
       "      <td>3870.000000</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>25838</td>\n",
       "      <td>0.184121</td>\n",
       "      <td>0.184591</td>\n",
       "      <td>0.691873</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149998</th>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5716.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>30556</td>\n",
       "      <td>0.309495</td>\n",
       "      <td>0.218298</td>\n",
       "      <td>0.882845</td>\n",
       "      <td>7420</td>\n",
       "      <td>108526</td>\n",
       "      <td>101106</td>\n",
       "      <td>0.740076</td>\n",
       "      <td>0.722320</td>\n",
       "      <td>0.705363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149999</th>\n",
       "      <td>0</td>\n",
       "      <td>0.850283</td>\n",
       "      <td>64</td>\n",
       "      <td>0</td>\n",
       "      <td>0.249908</td>\n",
       "      <td>8158.0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>35145</td>\n",
       "      <td>0.201177</td>\n",
       "      <td>0.251082</td>\n",
       "      <td>0.588475</td>\n",
       "      <td>1765</td>\n",
       "      <td>31522</td>\n",
       "      <td>29757</td>\n",
       "      <td>0.176042</td>\n",
       "      <td>0.212589</td>\n",
       "      <td>0.603269</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150000 rows × 81 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  \\\n",
       "0                      1                              0.766127   45   \n",
       "1                      0                              0.957151   40   \n",
       "2                      0                              0.658180   38   \n",
       "3                      0                              0.233810   30   \n",
       "4                      0                              0.907239   49   \n",
       "...                  ...                                   ...  ...   \n",
       "149995                 0                              0.040674   74   \n",
       "149996                 0                              0.299745   44   \n",
       "149997                 0                              0.246044   58   \n",
       "149998                 0                              0.000000   30   \n",
       "149999                 0                              0.850283   64   \n",
       "\n",
       "        NumberOfTime30-59DaysPastDueNotWorse    DebtRatio  MonthlyIncome  \\\n",
       "0                                          2     0.802982         9120.0   \n",
       "1                                          0     0.121876         2600.0   \n",
       "2                                          1     0.085113         3042.0   \n",
       "3                                          0     0.036050         3300.0   \n",
       "4                                          1     0.024926        63588.0   \n",
       "...                                      ...          ...            ...   \n",
       "149995                                     0     0.225131         2100.0   \n",
       "149996                                     0     0.716562         5584.0   \n",
       "149997                                     0  3870.000000         5400.0   \n",
       "149998                                     0     0.000000         5716.0   \n",
       "149999                                     0     0.249908         8158.0   \n",
       "\n",
       "        NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  \\\n",
       "0                                    13                        0   \n",
       "1                                     4                        0   \n",
       "2                                     2                        1   \n",
       "3                                     5                        0   \n",
       "4                                     7                        0   \n",
       "...                                 ...                      ...   \n",
       "149995                                4                        0   \n",
       "149996                                4                        0   \n",
       "149997                               18                        0   \n",
       "149998                                4                        0   \n",
       "149999                                8                        0   \n",
       "\n",
       "        NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  \\\n",
       "0                                  6                                     0   \n",
       "1                                  0                                     0   \n",
       "2                                  0                                     0   \n",
       "3                                  0                                     0   \n",
       "4                                  1                                     0   \n",
       "...                              ...                                   ...   \n",
       "149995                             1                                     0   \n",
       "149996                             1                                     0   \n",
       "149997                             1                                     0   \n",
       "149998                             0                                     0   \n",
       "149999                             2                                     0   \n",
       "\n",
       "        ...  good_bin_NumberOfOpenCreditLinesAndLoans  \\\n",
       "0       ...                                     25838   \n",
       "1       ...                                     30556   \n",
       "2       ...                                     30556   \n",
       "3       ...                                     24972   \n",
       "4       ...                                     35145   \n",
       "...     ...                                       ...   \n",
       "149995  ...                                     30556   \n",
       "149996  ...                                     30556   \n",
       "149997  ...                                     25838   \n",
       "149998  ...                                     30556   \n",
       "149999  ...                                     35145   \n",
       "\n",
       "       margin_bad_bin_NumberOfOpenCreditLinesAndLoans  \\\n",
       "0                                            0.184121   \n",
       "1                                            0.309495   \n",
       "2                                            0.309495   \n",
       "3                                            0.156892   \n",
       "4                                            0.201177   \n",
       "...                                               ...   \n",
       "149995                                       0.309495   \n",
       "149996                                       0.309495   \n",
       "149997                                       0.184121   \n",
       "149998                                       0.309495   \n",
       "149999                                       0.201177   \n",
       "\n",
       "       margin_good_bin_NumberOfOpenCreditLinesAndLoans  \\\n",
       "0                                             0.184591   \n",
       "1                                             0.218298   \n",
       "2                                             0.218298   \n",
       "3                                             0.178405   \n",
       "4                                             0.251082   \n",
       "...                                                ...   \n",
       "149995                                        0.218298   \n",
       "149996                                        0.218298   \n",
       "149997                                        0.184591   \n",
       "149998                                        0.218298   \n",
       "149999                                        0.251082   \n",
       "\n",
       "       woe_bin_NumberOfOpenCreditLinesAndLoans  \\\n",
       "0                                     0.691873   \n",
       "1                                     0.882845   \n",
       "2                                     0.882845   \n",
       "3                                     0.630962   \n",
       "4                                     0.588475   \n",
       "...                                        ...   \n",
       "149995                                0.882845   \n",
       "149996                                0.882845   \n",
       "149997                                0.691873   \n",
       "149998                                0.882845   \n",
       "149999                                0.588475   \n",
       "\n",
       "       bad_bin_NumberRealEstateLoansOrLines  \\\n",
       "0                                       841   \n",
       "1                                      7420   \n",
       "2                                      7420   \n",
       "3                                      7420   \n",
       "4                                      7420   \n",
       "...                                     ...   \n",
       "149995                                 7420   \n",
       "149996                                 7420   \n",
       "149997                                 7420   \n",
       "149998                                 7420   \n",
       "149999                                 1765   \n",
       "\n",
       "       all_bin_NumberRealEstateLoansOrLines  \\\n",
       "0                                      9952   \n",
       "1                                    108526   \n",
       "2                                    108526   \n",
       "3                                    108526   \n",
       "4                                    108526   \n",
       "...                                     ...   \n",
       "149995                               108526   \n",
       "149996                               108526   \n",
       "149997                               108526   \n",
       "149998                               108526   \n",
       "149999                                31522   \n",
       "\n",
       "       good_bin_NumberRealEstateLoansOrLines  \\\n",
       "0                                       9111   \n",
       "1                                     101106   \n",
       "2                                     101106   \n",
       "3                                     101106   \n",
       "4                                     101106   \n",
       "...                                      ...   \n",
       "149995                                101106   \n",
       "149996                                101106   \n",
       "149997                                101106   \n",
       "149998                                101106   \n",
       "149999                                 29757   \n",
       "\n",
       "       margin_bad_bin_NumberRealEstateLoansOrLines  \\\n",
       "0                                         0.083882   \n",
       "1                                         0.740076   \n",
       "2                                         0.740076   \n",
       "3                                         0.740076   \n",
       "4                                         0.740076   \n",
       "...                                            ...   \n",
       "149995                                    0.740076   \n",
       "149996                                    0.740076   \n",
       "149997                                    0.740076   \n",
       "149998                                    0.740076   \n",
       "149999                                    0.176042   \n",
       "\n",
       "       margin_good_bin_NumberRealEstateLoansOrLines  \\\n",
       "0                                          0.065091   \n",
       "1                                          0.722320   \n",
       "2                                          0.722320   \n",
       "3                                          0.722320   \n",
       "4                                          0.722320   \n",
       "...                                             ...   \n",
       "149995                                     0.722320   \n",
       "149996                                     0.722320   \n",
       "149997                                     0.722320   \n",
       "149998                                     0.722320   \n",
       "149999                                     0.212589   \n",
       "\n",
       "       woe_bin_NumberRealEstateLoansOrLines  \n",
       "0                                  0.827981  \n",
       "1                                  0.705363  \n",
       "2                                  0.705363  \n",
       "3                                  0.705363  \n",
       "4                                  0.705363  \n",
       "...                                     ...  \n",
       "149995                             0.705363  \n",
       "149996                             0.705363  \n",
       "149997                             0.705363  \n",
       "149998                             0.705363  \n",
       "149999                             0.603269  \n",
       "\n",
       "[150000 rows x 81 columns]"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_woe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T07:07:35.986527Z",
     "start_time": "2020-12-13T07:07:35.446497Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>bin</th>\n",
       "      <th>woe</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>(0.699, 50708.0]</td>\n",
       "      <td>1.495914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>(0.271, 0.699]</td>\n",
       "      <td>0.720083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>(0.0832, 0.271]</td>\n",
       "      <td>0.350952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>(-0.001, 0.0192]</td>\n",
       "      <td>0.243890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
       "      <td>(0.0192, 0.0832]</td>\n",
       "      <td>0.211221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(1.0, 2.0]</td>\n",
       "      <td>1.797837</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>0.572521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(2.0, 3.0]</td>\n",
       "      <td>2.151185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>183</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(3.0, 4.0]</td>\n",
       "      <td>2.429111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>191</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(4.0, 5.0]</td>\n",
       "      <td>2.520613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>251</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(6.0, 7.0]</td>\n",
       "      <td>2.774776</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>423</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(9.0, inf]</td>\n",
       "      <td>2.902860</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1052</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(5.0, 6.0]</td>\n",
       "      <td>2.812612</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6909</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(7.0, 8.0]</td>\n",
       "      <td>2.024184</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10822</th>\n",
       "      <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
       "      <td>(8.0, 9.0]</td>\n",
       "      <td>2.077007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>0.608707</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(2.0, 3.0]</td>\n",
       "      <td>2.998746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>186</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(1.0, 2.0]</td>\n",
       "      <td>2.701853</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1298</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(4.0, 5.0]</td>\n",
       "      <td>3.224503</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1713</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(3.0, 4.0]</td>\n",
       "      <td>3.379582</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1733</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(9.0, inf]</td>\n",
       "      <td>2.878935</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2910</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(8.0, 9.0]</td>\n",
       "      <td>3.691154</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3400</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(5.0, 6.0]</td>\n",
       "      <td>3.088387</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3929</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(6.0, 7.0]</td>\n",
       "      <td>4.140397</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5684</th>\n",
       "      <td>NumberOfTimes90DaysLate</td>\n",
       "      <td>(7.0, 8.0]</td>\n",
       "      <td>3.580814</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(-inf, 1.0]</td>\n",
       "      <td>0.645352</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>186</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(1.0, 2.0]</td>\n",
       "      <td>2.712133</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>423</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(4.0, 5.0]</td>\n",
       "      <td>3.159234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1146</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(2.0, 3.0]</td>\n",
       "      <td>2.955438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1733</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(9.0, inf]</td>\n",
       "      <td>2.886833</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2406</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(3.0, 4.0]</td>\n",
       "      <td>3.164917</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6664</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(5.0, 6.0]</td>\n",
       "      <td>3.758483</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16642</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(6.0, 7.0]</td>\n",
       "      <td>2.915139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23964</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(7.0, 8.0]</td>\n",
       "      <td>2.705454</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68976</th>\n",
       "      <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
       "      <td>(8.0, 9.0]</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>age</td>\n",
       "      <td>(40.0, 50.0]</td>\n",
       "      <td>0.813822</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>age</td>\n",
       "      <td>(25.0, 40.0]</td>\n",
       "      <td>0.955231</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>age</td>\n",
       "      <td>(70.0, inf]</td>\n",
       "      <td>0.279404</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>age</td>\n",
       "      <td>(50.0, 60.0]</td>\n",
       "      <td>0.651655</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>age</td>\n",
       "      <td>(60.0, 70.0]</td>\n",
       "      <td>0.406848</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>age</td>\n",
       "      <td>(-inf, 25.0]</td>\n",
       "      <td>1.013134</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    feature               bin       woe\n",
       "0      RevolvingUtilizationOfUnsecuredLines  (0.699, 50708.0]  1.495914\n",
       "2      RevolvingUtilizationOfUnsecuredLines    (0.271, 0.699]  0.720083\n",
       "3      RevolvingUtilizationOfUnsecuredLines   (0.0832, 0.271]  0.350952\n",
       "11     RevolvingUtilizationOfUnsecuredLines  (-0.001, 0.0192]  0.243890\n",
       "14     RevolvingUtilizationOfUnsecuredLines  (0.0192, 0.0832]  0.211221\n",
       "0      NumberOfTime30-59DaysPastDueNotWorse        (1.0, 2.0]  1.797837\n",
       "1      NumberOfTime30-59DaysPastDueNotWorse       (-inf, 1.0]  0.572521\n",
       "13     NumberOfTime30-59DaysPastDueNotWorse        (2.0, 3.0]  2.151185\n",
       "183    NumberOfTime30-59DaysPastDueNotWorse        (3.0, 4.0]  2.429111\n",
       "191    NumberOfTime30-59DaysPastDueNotWorse        (4.0, 5.0]  2.520613\n",
       "251    NumberOfTime30-59DaysPastDueNotWorse        (6.0, 7.0]  2.774776\n",
       "423    NumberOfTime30-59DaysPastDueNotWorse        (9.0, inf]  2.902860\n",
       "1052   NumberOfTime30-59DaysPastDueNotWorse        (5.0, 6.0]  2.812612\n",
       "6909   NumberOfTime30-59DaysPastDueNotWorse        (7.0, 8.0]  2.024184\n",
       "10822  NumberOfTime30-59DaysPastDueNotWorse        (8.0, 9.0]  2.077007\n",
       "0                   NumberOfTimes90DaysLate       (-inf, 1.0]  0.608707\n",
       "13                  NumberOfTimes90DaysLate        (2.0, 3.0]  2.998746\n",
       "186                 NumberOfTimes90DaysLate        (1.0, 2.0]  2.701853\n",
       "1298                NumberOfTimes90DaysLate        (4.0, 5.0]  3.224503\n",
       "1713                NumberOfTimes90DaysLate        (3.0, 4.0]  3.379582\n",
       "1733                NumberOfTimes90DaysLate        (9.0, inf]  2.878935\n",
       "2910                NumberOfTimes90DaysLate        (8.0, 9.0]  3.691154\n",
       "3400                NumberOfTimes90DaysLate        (5.0, 6.0]  3.088387\n",
       "3929                NumberOfTimes90DaysLate        (6.0, 7.0]  4.140397\n",
       "5684                NumberOfTimes90DaysLate        (7.0, 8.0]  3.580814\n",
       "0      NumberOfTime60-89DaysPastDueNotWorse       (-inf, 1.0]  0.645352\n",
       "186    NumberOfTime60-89DaysPastDueNotWorse        (1.0, 2.0]  2.712133\n",
       "423    NumberOfTime60-89DaysPastDueNotWorse        (4.0, 5.0]  3.159234\n",
       "1146   NumberOfTime60-89DaysPastDueNotWorse        (2.0, 3.0]  2.955438\n",
       "1733   NumberOfTime60-89DaysPastDueNotWorse        (9.0, inf]  2.886833\n",
       "2406   NumberOfTime60-89DaysPastDueNotWorse        (3.0, 4.0]  3.164917\n",
       "6664   NumberOfTime60-89DaysPastDueNotWorse        (5.0, 6.0]  3.758483\n",
       "16642  NumberOfTime60-89DaysPastDueNotWorse        (6.0, 7.0]  2.915139\n",
       "23964  NumberOfTime60-89DaysPastDueNotWorse        (7.0, 8.0]  2.705454\n",
       "68976  NumberOfTime60-89DaysPastDueNotWorse        (8.0, 9.0]  0.000000\n",
       "0                                       age      (40.0, 50.0]  0.813822\n",
       "1                                       age      (25.0, 40.0]  0.955231\n",
       "5                                       age       (70.0, inf]  0.279404\n",
       "6                                       age      (50.0, 60.0]  0.651655\n",
       "15                                      age      (60.0, 70.0]  0.406848\n",
       "19                                      age      (-inf, 25.0]  1.013134"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Build the bin-to-WOE lookup table for each feature\n",
    "df_bin_to_woe = pd.DataFrame(columns=['feature', 'bin', 'woe'])\n",
    "for f in feature_cols:\n",
    "    b = 'bin_' + f\n",
    "    w = 'woe_bin_' + f\n",
    "    # One row per distinct (woe, bin) pair of this feature\n",
    "    df = df_woe[[w, b]].drop_duplicates()\n",
    "    df.columns = ['woe', 'bin']\n",
    "    df['feature'] = f\n",
    "    df_bin_to_woe = pd.concat([df_bin_to_woe, df])\n",
    "\n",
    "df_bin_to_woe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T06:57:11.326660Z",
     "start_time": "2020-12-13T06:57:11.318660Z"
    }
   },
   "source": [
    "## Logistic regression modeling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T07:07:39.398723Z",
     "start_time": "2020-12-13T07:07:39.386722Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['woe_bin_age',\n",
       " 'woe_bin_NumberOfDependents',\n",
       " 'woe_bin_NumberOfTime30-59DaysPastDueNotWorse',\n",
       " 'woe_bin_NumberOfTime60-89DaysPastDueNotWorse',\n",
       " 'woe_bin_NumberOfTimes90DaysLate',\n",
       " 'woe_bin_RevolvingUtilizationOfUnsecuredLines',\n",
       " 'woe_bin_DebtRatio',\n",
       " 'woe_bin_MonthlyIncome',\n",
       " 'woe_bin_NumberOfOpenCreditLinesAndLoans',\n",
       " 'woe_bin_NumberRealEstateLoansOrLines']"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "woe_cols = [c for c in list(df_woe.columns.values) if 'woe' in c]\n",
    "woe_cols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T07:07:41.186825Z",
     "start_time": "2020-12-13T07:07:41.075819Z"
    }
   },
   "outputs": [],
   "source": [
    "# Split the dataset into training and test sets (80% / 20%)\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "x_train, x_test, y_train, y_test = train_test_split(\n",
    "    df_woe[woe_cols], df_woe['SeriousDlqin2yrs'], test_size=0.2, random_state=30\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-12-13T07:10:02.276895Z",
     "start_time": "2020-12-13T07:09:59.223720Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import accuracy_score, roc_auc_score\n",
    "\n",
    "model = LogisticRegression(random_state=33).fit(x_train, y_train)\n",
    "y_pred = model.predict(x_test)\n",
    "# Probability of the positive class; AUC should be computed from scores, not hard labels\n",
    "y_score = model.predict_proba(x_test)[:, 1]\n",
    "\n",
    "# scikit-learn metrics expect the true labels as the first argument\n",
    "print(accuracy_score(y_test, y_pred))\n",
    "print(roc_auc_score(y_test, y_score))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
